(54) Methods of refining descriptors

(57) A database containing data items, such as 
images, text, audio records or video records, in both 
summary and complete forms, is searched by reference 
to descriptors associated with the data items, in 
response to user requests. The search result contains 
the summary form of the selected items. Responses
from users indicative of the utility of the search results
(implicit in the users' subsequent actions in requesting 
the complete forms of specific selected items) are used 
to refine the descriptors to alter subsequent search 
results. 
Description

Technical Field 

[0001] This invention relates to methods of refining descriptors, such as those used for retrieving data items
from databases.

Background Art 

[0002] A major obstacle to the efficient retrieval of data is the way they are indexed (i.e. the way descriptors or
keywords are selected). Currently there are two common ways of indexing:

1. The use of an automatic indexing tool to extract words from text documents or recognize forms and elements in
images, videos, etc. This is based on artificial intelligence (AI) techniques and has the limits that this technology
offers.

2. One or more people do the indexing manually after a close analysis of the data. This is usually accurate but
reliant on the vocabulary of the indexers and their perception of the data (it may be very subjective for images, for
instance). It is also time-consuming.

[0003] Both of these techniques provide a set of indexing keywords or descriptors which are static, and which very
often belong to a vocabulary that is inconsistent and limited. However, people querying the system in effect provide
possible keywords in their queries. The keywords in the queries may not be existing descriptors but they are relevant to the
data. Currently this information is left unused and forgotten by the system once the user quits the system. As a result,
if the indexing keywords are inappropriate, nothing can be done to improve them even if some people may provide good
indexing terms as they search.

[0004] If the terminology commonly used changes over time (for example, one technical term becomes superseded 
by another), then it becomes necessary to redo all the indexing which is undesirable, especially as databases become 
bigger and bigger. 

[0005] Thus there is a need to describe data so that they will be more easily searchable by the majority of the
community.

Disclosure of Invention 

[0006] According to one aspect of this invention there is provided a method of refining descriptors associated with
data items to enable retrieval thereof, comprising the steps of:

storing said data items in both a summary form and in a complete form; 

receiving a search request from a user for selection of data items, said request incorporating at least one
descriptor;

sending the user a search result comprising only the summary form of data items selected in accordance with said
search request; and

using a response of the user requesting the complete form of a selected data item in the search result to guide 
modifications of the descriptors. 

Brief Description of Drawings

[0007] Methods and apparatus in accordance with this invention for refining descriptors associated with data items 
will now be described, by way of example, with reference to the accompanying drawings, in which: 

Figures 1 and 2 are used in describing different indexing systems;

Figure 3 is a block schematic diagram of a system for implementing the invention;

Figure 4 illustrates adaptive indexing of an image;

Figure 5 shows various stages involved in managing inclusion of an item in or omission of that item from a set
of descriptors;

Figure 6 illustrates operation of a technique for adjusting keyword weights; and

Figure 7 shows an exponential function used in deciding weighting factors applied to descriptors.
Best Mode for Carrying Out the Invention, & Industrial Applicability

[0008] Keywords or descriptors available in a database system to lead a searcher to a particular item (such as a
document or image) are often different from the descriptors which best describe the content of that item. This is what makes
information retrieval sometimes inaccurate and unsuccessful. In a traditional retrieval system which provides a static
index by algorithmic means, the index can be represented by a data item-keyword sparse matrix M of fixed dimensionality
(see Figure 1). An entry in the matrix M(d, i) is a number and has the meaning "the data item d has been indexed
with a keyword i". Binary information or keyword frequencies can be stored and this leads to traditional binary or
probabilistic retrieval systems. The main weakness of the static indexing approach is a user-system vocabulary mismatch
(need for thesauri, stemming and fuzzy matching) and a need for mapping a user query into the indices that the system 

[0009] It would be desirable to capture the keywords provided by users while searching, and associate them with the 
items the users retrieved. In this way it can be ensured that items are usefully associated with the keywords people 
actually use to retrieve them.

[0010] An example of the invention is referred to herein as adaptive indexing, where the items are indexed by
reference to contributions (explicit or implicit) from the entire community as people search the data. With adaptive indexing,
the system is capable of capturing the keywords entered by the user community. The information captured from the user
interaction during the process of searching and browsing the results leads to automatic thesaurus build-up, gradual
convergence of the system's keywords to the user population vocabulary and indexes which are always up-to-date. The
dynamic index could be visualised as a list of keywords (Figure 2) with the containing list enumerating all the data items
and the contained lists enumerating all the keywords for a given data item. Keywords could also have scores attached
to them according to their degree of relevance for a data item. The lists expand when a new keyword is entered.
[0011] In this scenario, each item or piece of data has a set of associated dynamic descriptors or keywords, which
are not static but which can change with time as a result of information from people searching. Each descriptor for a
given piece of data has a weight that measures its relevance to that piece of data. The value of the weight is statistically
determined by the searches. At an instant in time, the descriptor with the highest weight at that time is considered to be
the best description of the piece of data because it was the most popular description of the piece of data given by
people using the system. The descriptor with the smallest weight is not very relevant to the piece of data, and if its weight
continues to decrease then at some point the descriptor may be removed (garbage collected).
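The dynamic index described above can be pictured with a small data structure. The following is only a minimal sketch: the class name `DynamicIndex`, its methods and the value of `gc_threshold` are hypothetical and chosen for illustration, and the way weights are actually computed is left to the techniques described later.

```python
from collections import defaultdict


class DynamicIndex:
    """Hypothetical in-memory index: data item id -> {keyword: weight}."""

    def __init__(self, gc_threshold: float = 0.05):
        # gc_threshold is an illustrative value, not one given in the text.
        self.gc_threshold = gc_threshold
        self.items = defaultdict(dict)

    def set_weight(self, item_id: str, keyword: str, weight: float) -> None:
        """Create or update the weight of one keyword for one data item."""
        self.items[item_id][keyword] = weight

    def best_descriptor(self, item_id: str) -> str:
        """Return the keyword that currently has the highest weight."""
        weights = self.items[item_id]
        return max(weights, key=weights.get)

    def collect_garbage(self, item_id: str) -> list:
        """Remove and return keywords whose weight fell below the threshold."""
        weights = self.items[item_id]
        removed = sorted(k for k, w in weights.items() if w < self.gc_threshold)
        for keyword in removed:
            del weights[keyword]
        return removed
```

Each data item owns its own keyword list, so a new keyword simply extends that list, and a keyword whose weight has decayed is dropped without affecting other items.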
[0012] Feedback from users may be explicit (e.g. users provide comments on how useful or relevant the search result
was) or implicit (e.g. a system monitors whether people make purchases in relation to the content of search results). In
the example described below with reference to Figures 3 and 4 the feedback is implicit, i.e. the user does not know
about the learning process going on in the system. Consequently, the system has to evaluate the associations that are
implied by the user's actions.

[0013] Referring to Figure 3, a user 10 operates a computer terminal 12 to send search requests via a communications
link 14 (e.g. in a computer communications network) to an input/output interface 16. The interface 16 forwards the
search requests to a processor 18 which executes software program instructions stored in a memory 20 to search a
store 22 holding data items and associated descriptors for indexing them. The program instructions may include, for
example, search engine functions for scanning for data items associated with specified descriptors, and functions for
managing the associations with descriptors.
[0014] Each data item (document, image, etc.) is held in the store 22 in two different forms: 

- a "summary form" where it is minimally described so that the user can quickly access its contents. In the case of
images, the summary may be a "thumbnail" (a reduced-scale, low resolution version of the full image) with possibly
an image title or some other information such as the photographer's or artist's name, or an image reference number.
By examining this summary the user can form an initial opinion on whether this item may be relevant to their
query;

- a "complete form", which contains all the necessary information about the item to enable a decision to be made on
whether it is relevant to the current query or not. For an image, for instance, this complete form would comprise the
image at a good enough resolution to enable details of the composition as well as the quality of the image to be
assessed. In a remote technical support system it could be the full history of a call for support by a user and the
assistance and advice given (Question / Answer / Comments / Pointers to relevant documents etc.).

[0015] The user first provides a query, and the processor 18 responds with a list of summaries of items which
appear potentially relevant. The user browses through this result list, and upon finding an item which appears from its
summary to be similar to what is being sought, she will access the complete form of that item. The processor 18, under
the control of the program in the memory 20, treats this choice as an implicit signal that an association has been made
between the initial query (list of descriptors) and that item in the database. Consequently it updates the descriptors of
the selected item accordingly in the store 22. All the keywords in the query are associated with the selected item in this
way, irrespective of whether or not they were already descriptors for that item - this is how the system applies new
keywords.
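The capture step just described can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the processor 18: the names `DataItem`, `search` and `request_complete` are hypothetical, matching is reduced to "any query keyword is already a descriptor", and a plain association count stands in for the weight update described later.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    summary: str   # e.g. a thumbnail reference
    complete: str  # e.g. the full-resolution image reference
    # keyword -> number of times it has been associated with this item
    descriptors: dict = field(default_factory=dict)


def search(store: dict, query: list) -> dict:
    """Return only the summary form of items matching any query keyword."""
    return {
        item_id: item.summary
        for item_id, item in store.items()
        if any(keyword in item.descriptors for keyword in query)
    }


def request_complete(store: dict, item_id: str, query: list) -> str:
    """Serve the complete form and treat the request as implicit feedback.

    Every keyword of the query is associated with the selected item,
    whether or not it was already a descriptor of that item.
    """
    item = store[item_id]
    for keyword in query:
        item.descriptors[keyword] = item.descriptors.get(keyword, 0) + 1
    return item.complete
```

The search itself never modifies the store; only the later request for the complete form does, which is what makes the feedback implicit.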
[0016] Thus, referring to Figure 4, a picture of a pig with a litter of piglets standing by a cluster of flowers may already
be indexed according to the terms pig, farm, family, floors and field. A user may enter a search query containing the
terms piglet, farm, countryside and family. Because countryside and piglet are not already present as descriptors
[...] associated with this item (such as pig and farm). If floors persists with zero weighting it is eventually removed.

[0017] The evolution of the weights of the descriptors for a given data item is tailored by the interactions of the users.
The more users associate a descriptor with a particular data item, the higher the resulting weight. If users' description
of the data item changes (through, for example, evolution of terminology, historical events, new terminology, a new data
domain or a new set of users) the descriptors will evolve according to the majority opinion of the community of users.
[0018] As this technique relies on purely implicit indications, the possibility of some inappropriate associations cannot
be avoided. For instance the user 10 may be seeking a picture of a lion eating its prey. She may enter the query "lion
eating prey" and the processor 18 returns a picture of an antelope resting in the shadow of a tree. Although the user is
not interested in buying rights to use this image she finds it appealing and requests the complete form to see the image
in more detail out of pure curiosity. The keyword capture process implemented by the processor 18 will reflect the
action by reinforcing or re-indexing this image of an antelope with the keywords "lion", "eating" and "prey", with "lion"
perhaps becoming a new descriptor in the process. It is also possible to make an inappropriate association by
associating an item with a misspelled keyword.
[0019] However such inappropriate associations should have a minimal impact on the system, because an individual
association does not change the keyword weighting very much. A newly associated keyword does not acquire maximal
significance immediately after the first association is made. In other words, more than a single association is needed to
radically change the indexing; for example, in one implementation, new keywords are not used for search until five
associations with that keyword have been made. Consequently a misspelled keyword will become a valid descriptor for the
data item only if it is a common misspelling.

[0020] So far adaptive indexing has been described essentially as a process happening when access is made to the
complete form of a data item for preview/review. However it is also possible to introduce multiple levels of impact on the
indexing. For example, it is possible to reinforce the descriptors further if the user 10 decides actually to buy rights to
use an image. This is equivalent to adding an extra level of relevance feedback that would be more explicit and it
reduces the potential risks of purely implicit feedback. Explicit feedback could also be introduced at the end of the
retrieval phase, especially for remote support systems where customer satisfaction is often monitored.
[0021] Each association between a descriptor or keyword and a data item has a weight, which is a value between 0
and 1. This weight may be implemented in either of two ways:

- data item focussed: weights are associated with data items and normalisation is done relative to data items; this
implies defining how a data item is described;

- keyword focussed: weights are associated with keywords and normalisation is done relative to a keyword; this
implies defining what a keyword means.

In the present embodiment keyword focussed weighting is used. This is mainly to ensure that specialist keywords which
are very rarely used (but which are extremely good descriptors) nonetheless strongly influence the result of a query. If
the data item focussed approach were used, the weight of such rarely used keywords would be small compared to the
weight of other, more common, keywords. Thus a query combining a common keyword and an unusual keyword would
yield a result with many items matching the common keyword, possibly swamping the items matching the unusual
though highly relevant descriptor. With the keyword focussed approach the weight of this unusual descriptor is high, as
the number of data items described by this specific keyword is small.
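The difference between the two normalisations can be seen on a small example. The sketch below assumes that raw association counts are available per data item and per keyword, and it interprets "normalisation relative to data items" as dividing by the total count of the item and "normalisation relative to a keyword" as dividing by the total count of the keyword. Both function names and the sample data are hypothetical.

```python
def item_focused(counts: dict) -> dict:
    """Normalise each count by the total count of its data item."""
    return {
        item: {k: c / sum(kws.values()) for k, c in kws.items()}
        for item, kws in counts.items()
    }


def keyword_focused(counts: dict) -> dict:
    """Normalise each count by the total count of its keyword."""
    totals = {}
    for kws in counts.values():
        for k, c in kws.items():
            totals[k] = totals.get(k, 0) + c
    return {
        item: {k: c / totals[k] for k, c in kws.items()}
        for item, kws in counts.items()
    }


# Illustrative counts: "okapi" is a rare specialist keyword.
counts = {
    "img1": {"animal": 9, "okapi": 1},
    "img2": {"animal": 10},
}
by_item = item_focused(counts)
by_keyword = keyword_focused(counts)
```

With these counts the rare keyword "okapi" receives a small weight for img1 under the data item focussed scheme, because it competes with "animal" on the same item, but the maximum weight under the keyword focussed scheme, because img1 is the only item it describes.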

[0022] This choice has the side effect of giving increased importance to popular data items (as far as the weights are
concerned). From a commercial viewpoint this is advantageous: in the case of images, for instance, there are often
images which are particularly popular at one time, according to current fashion.

[0023] The keyword weight is used to evaluate the importance of a keyword for a particular data item and also to rank
the results of a user query. In the embodiment described herein, for each keyword three different weight values are
distinguished, which determine the status of that keyword:

- $w_k^{0}$: the initial weight when the keyword first enters the system;

- $w_k^{User}$: the threshold for a keyword to become searchable (i.e. be taken into account in determining whether a data
item should be included in the result of a user query);

- $w_k^{GC}$: the threshold below which a keyword becomes "garbage collected" (i.e. no longer used in assembling
results for user queries).

The specific calculation of each of these values depends on the adaptive indexing algorithm adopted.

[0024] Further, each keyword has a status, determined by these weight values, which reflects the influence of the
users as implied by their reactions to query results:



1. Master keyword: the master keyword is provided by the content provider or by a professional indexer (this is the
original descriptor). This keyword cannot be removed by the system without the explicit consent of a supervisor.
This is because some terms are very specific to the data item, or are even key descriptors, but are not frequently
used because the average user (general public) is not familiar with them. Nonetheless their inclusion enables
specialists to access the data item quickly.

2. User keyword: this keyword has been provided by a user (general public); it is searchable because a significant
number of people have already associated this keyword with a particular data item.

3. Candidate keywords: there are two different types of candidate keyword reflecting two different types of transition
for a keyword. In either case they are not "active" (not searchable). They will be garbage collected when their weight
falls below a given value ($w_k^{GC}$) or become user keywords if their weight rises above a given value ($w_k^{User}$).

- candidate for User keyword: this is the initial status of a new keyword entering the system. This keyword is not
yet searchable, because it could be misspelled or an inappropriate association as described above. This status
reduces the risk of the search process being slowed by the presence of large numbers of "junk" keywords.
However, this also makes the addition of a new keyword harder, as it has to be used in association with an
existing keyword for a data item several times before it exceeds the threshold to become a User keyword itself;

- candidate for Garbage Collection: this status is reserved for User keywords whose weight decreases to its
original introduction weight ($w_k^{0}$), i.e. they became User keywords but were very rarely used afterwards. This could
happen, for example, either by evolution of the vocabulary or by entry of an inappropriate keyword at some
point in time.

[0025] Figure 5 shows the possible transitions between these different possible statuses, and the corresponding
values for the weight w which result in each transition.
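The status transitions can be summarised as a small state machine. The sketch below is one reading of the statuses listed above, assuming the ordering GC threshold < initial weight < User threshold; the names `Status` and `next_status` and the extra `REMOVED` state are hypothetical, and Figure 5 may contain details that are not reproduced here.

```python
from enum import Enum


class Status(Enum):
    MASTER = "master"
    USER = "user"
    CANDIDATE_USER = "candidate for user"
    CANDIDATE_GC = "candidate for garbage collection"
    REMOVED = "garbage collected"


def next_status(status: Status, w: float, w_init: float,
                w_user: float, w_gc: float) -> Status:
    """Return the status of a keyword after its weight has become w."""
    # Master keywords are never changed automatically; removal is final.
    if status in (Status.MASTER, Status.REMOVED):
        return status
    # A user keyword that decays to its introduction weight is demoted.
    if status is Status.USER:
        return Status.CANDIDATE_GC if w <= w_init else Status.USER
    # Both kinds of candidate are promoted or removed by the same thresholds.
    if w > w_user:
        return Status.USER
    if w < w_gc:
        return Status.REMOVED
    return status
```

Only keywords in the `MASTER` or `USER` status would be consulted when assembling query results; the two candidate statuses are inactive.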

[0026] Various different techniques for varying the value of the weight w can be used. Two are described below.
The first is a straightforward interpretation of simple probabilistic rules. The second one is more empirical and aims at
forcing the weights to evolve following an exponential curve.
[0027] In the first technique the weight is fixed for a given period of time. At the end of each period, the weight is
re-evaluated according to the extent of association which has occurred during that period. The duration of a period is the
only arbitrary parameter. It depends on the total number of data items and on the extent of use of the search system
(number of queries per day for instance). At the beginning of each period, for each keyword k, two counters are set to
zero:

- $C_k$ represents the number of times the keyword has been associated with data items (irrespective of whether it was
with different data items or many times with the same data item);

- $C_{k,i}$ represents the number of times the keyword has been associated with data item i.

At the end of the period, the weight of the association between a keyword and a data item is defined by

$$w_{k,i} = \begin{cases} C_{k,i} / C_k & \text{if } C_k \neq 0 \\ 0 & \text{otherwise} \end{cases}$$

In other words, the weight represents the probability that the data item i is indexed by the keyword k. Under these
circumstances, the starting weight $w_k^{0}$ for a new keyword will be proportional to $1/C_k$. The two other thresholds $w_k^{User}$
and $w_k^{GC}$ are arbitrary and will be the same for all the keywords. A disadvantage of this method is that the history
associated with a weight is rather limited and very dependent on the activity of the search system, and more specifically on
the extent of use of the keyword.
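The first technique amounts to two counters per period and one division at the end of the period. A minimal sketch follows, in which the class `PeriodCounters` and its method names are hypothetical and the weight of an association is taken to be the count for the (keyword, data item) pair divided by the total count for the keyword, or 0 when the keyword was not used.

```python
from collections import Counter


class PeriodCounters:
    """Counts keyword associations during one period."""

    def __init__(self):
        self.c_k = Counter()   # keyword -> associations with any data item
        self.c_ki = Counter()  # (keyword, item) -> associations with that item

    def record(self, keyword: str, item_id: str) -> None:
        """Register one association made during the current period."""
        self.c_k[keyword] += 1
        self.c_ki[(keyword, item_id)] += 1

    def weight(self, keyword: str, item_id: str) -> float:
        """Weight the association would receive if the period ended now."""
        total = self.c_k[keyword]
        return self.c_ki[(keyword, item_id)] / total if total else 0.0

    def close_period(self) -> dict:
        """Return the weights for the period and reset both counters."""
        weights = {
            (k, i): count / self.c_k[k] for (k, i), count in self.c_ki.items()
        }
        self.c_k.clear()
        self.c_ki.clear()
        return weights
```

Because the counters are cleared at every period boundary, the weights carry no memory of earlier periods, which is the limitation noted above.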

[0028] The lists of keywords could be sorted according to normalised scores and divided into quantized intervals of
fixed length in proportion to each keyword's probability of indexing a data item. Keywords would compete on the basis
of weight to be promoted to the higher interval and would be moved down to the lower interval by more appropriate
keywords. Conditions can be specified for crossing interval boundaries to prevent keywords oscillating between intervals
(see Figure 6). Keyword probabilities may be quantized to save storage (a byte gives 256 bins of length 0.004 which
could be sufficient).
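The storage saving mentioned above can be illustrated by mapping a probability to a single byte. This is a sketch only: the function names are hypothetical, and decoding to the midpoint of each bin is a choice made here rather than something stated in the text.

```python
def quantize(p: float) -> int:
    """Map a probability in [0, 1] to one of 256 bins (one byte)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # The last bin also receives p == 1.0.
    return min(int(p * 256), 255)


def dequantize(b: int) -> float:
    """Return the midpoint of bin b as the reconstructed probability."""
    if not 0 <= b <= 255:
        raise ValueError("bin index must fit in one byte")
    return (b + 0.5) / 256
```

Each bin has length 1/256, about 0.004, so the reconstruction error with midpoint decoding is at most 1/512.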

[0029] In the second, exponential function, technique, the weight follows a curve composed of two exponential curves
(see Figure 7). Thus increasing and decreasing the weights is reduced to a simple multiplication by a specific
coefficient. Each association of a data item with the same keyword will follow the same curve depending on the initial weight.
[0030] For convenience and efficiency of computation it is preferable to store the value

$$v = w - 1$$

instead of w itself. Hereinafter v is referred to as the "Relationship Coefficient". In addition the following notations are
used:

- $v_{ik}$: Relationship Coefficient of association [i,k]
- $v_k^{0}$: Initial Relationship Coefficient for associations with keyword k
- $v_k^{User}$: User Relationship Coefficient for associations with keyword k
- $v_k^{GC}$: GC Relationship Coefficient for associations with keyword k
- $n_k$: Number of data items indexed by keyword k in active associations

The following requirements can be easily ensured:

- weight limitation to represent a relevancy weight (i.e. $w_{ik} \in\, ]0,1[$ or $v_{ik} \in\, ]-1,0[$)

- we can control the number $N_{User}$ of interactions needed for a keyword in an association to become an active
keyword (i.e. the total number of associations of that keyword with that data item needed to have $v_{ik} > v_k^{User}$)

- we can control the number $N_{GC}$ of negative interactions needed for a keyword in an association to become a
candidate for garbage collection (i.e. the number of associations needed to have $v_{ik} < v_k^{0}$)

- reversibility of updating (i.e. $v_{ik}$ increased once and decreased $n_k$ times returns to the initial value before the
increase)

[0031] The weight-updating procedure requires an initial setting for the relationship coefficient. For each association 
between a keyword and a data item this relationship coefficient will be increased or decreased. In the case of a new 
association, it will be entered into the store 22, and if this new association is made with a new keyword, this keyword 
will also be entered into the store 22. 
[0032] The relationship coefficients for a keyword are first initialized depending on the number of data items indexed
by the keyword. It is considered that the more data items one keyword indexes, the less relevant it should initially be to
describe the data items. Thus $v_k^{0}$ can take the form:

For practical reasons it is undesirable for the initial relationship coefficient to be too high, so 2 is added to $n_k$ in the
formula. This is arbitrary and different values could be used here.
[0033] For each part of the curve (Part 1 and Part 2, Figure 7) there is a respective increasing coefficient and a
decreasing coefficient. These coefficients, as well as the initial relationship coefficient, are specific to each keyword.

Part 2 of the curve: when $v > v_k^{0}$

[0034] If f is the function describing this part, with the two initial conditions $f(0) = v_k^{0}$ and the slope $f'(0) = s$, the
following expression for f is obtained:

$$f(x) = v_k^{0} \exp\left(\frac{s\,x}{v_k^{0}}\right)$$

Increasing coefficient equation (calculated from $f(x+1)$):

$$v_{ik} = C^{(2)}_{inc} \cdot v_{ik} \quad \text{with} \quad C^{(2)}_{inc} = \exp\left(\frac{s}{v_k^{0}}\right)$$

Decreasing coefficient equation (calculated from $f(x - 1/n_k)$):

$$v_{ik} = C^{(2)}_{dec} \cdot v_{ik} \quad \text{with} \quad C^{(2)}_{dec} = \exp\left(-\frac{s}{n_k\, v_k^{0}}\right)$$

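Part 1 of the curve: when $v < v_k^{0}$

If g is the function describing this part, with the two initial conditions $g(0) = v_k^{0}$ and the slope $g'(0) = s$, the following
expression for g is obtained:

$$g(x) = (v_k^{0} + 1) \exp\left(\frac{s\,x}{v_k^{0} + 1}\right) - 1$$

Increasing coefficient equation (calculated from $g(x+1)$):

$$v_{ik} = C^{(1)}_{inc} \cdot (v_{ik} + 1) - 1 \quad \text{with} \quad C^{(1)}_{inc} = \exp\left(\frac{s}{v_k^{0} + 1}\right)$$

Decreasing coefficient equation (calculated from $g(x - 1/n_k)$):

$$v_{ik} = C^{(1)}_{dec} \cdot (v_{ik} + 1) - 1 \quad \text{with} \quad C^{(1)}_{dec} = \exp\left(-\frac{s}{n_k\,(v_k^{0} + 1)}\right)$$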

In Figure 7 the x axis represents the number N of associations between a keyword and a data item. The exact
derivation of N is:

$$N = N^{+} - \frac{N^{-}}{n_k}$$

where $N^{+}$ represents the number of times the association [i,k] was made and $N^{-}$ the number of times the associations
[j,k] were made for j ≠ i.
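The exponential technique can be sketched in a few lines. The sketch rests on several assumptions that should be read as a reconstruction rather than as the definitive formulas: the relationship coefficient is taken to be v = w - 1, each branch of the curve is the exponential fixed by the value v0 and the slope s at x = 0, an increase moves one unit along the x axis and a decrease moves back by 1/n_k, and the position on the x axis is the number of associations of the item itself minus the number of associations of other items divided by n_k. All function and key names are hypothetical.

```python
import math


def coefficients(v0: float, s: float, n_k: int) -> dict:
    """Multiplicative coefficients for both parts of the curve."""
    if not -1.0 < v0 < 0.0:
        raise ValueError("v0 must lie strictly between -1 and 0")
    if n_k < 1:
        raise ValueError("n_k must be at least 1")
    return {
        "inc1": math.exp(s / (v0 + 1.0)),          # part 1, v below v0
        "dec1": math.exp(-s / (n_k * (v0 + 1.0))),
        "inc2": math.exp(s / v0),                  # part 2, v above v0
        "dec2": math.exp(-s / (n_k * v0)),
    }


def increase(v: float, v0: float, c: dict) -> float:
    """One positive interaction: move one unit along the x axis."""
    if v >= v0:
        return c["inc2"] * v
    return c["inc1"] * (v + 1.0) - 1.0


def decrease(v: float, v0: float, c: dict) -> float:
    """One negative interaction: move back by 1/n_k along the x axis."""
    if v > v0:
        return c["dec2"] * v
    return c["dec1"] * (v + 1.0) - 1.0


def curve_position(n_plus: int, n_minus: int, n_k: int) -> float:
    """Position N on the x axis after the given interactions."""
    return n_plus - n_minus / n_k
```

The branch is chosen from the current value of v, so a step that crosses v0 lands only approximately on the curve. Within one branch the updates are exact: one increase followed by n_k decreases returns to the starting value, which is the reversibility property listed among the requirements.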

[0035] The User and GC relationship coefficients are derived as follows:

User relationship coefficient $v_k^{User}$

This represents the value of v taken after $N_{User}$ associations have occurred, without any decrease, since the
association was created (i.e. since $v = v_k^{0}$).

GC relationship coefficient $v_k^{GC}$

This represents the value of v that would take the value of the initial relationship coefficient $v_k^{0}$, after $N_{GC}$ negative
interactions have occurred, without any decrease.
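The User relationship coefficient can be computed directly once a form for the upper branch of the curve is fixed. The sketch below assumes that branch to be f(x) = v0 * exp(s * x / v0), which is the exponential satisfying f(0) = v0 and f'(0) = s; the function names are hypothetical, and the GC coefficient is deliberately left out because its defining formula is not recoverable from the text.

```python
import math


def user_coefficient(v0: float, s: float, n_user: int) -> float:
    """Value reached from v0 after n_user increases and no decrease."""
    return v0 * math.exp(s * n_user / v0)


def increases_needed(v0: float, s: float, v_user: float) -> int:
    """Smallest number of uninterrupted increases giving v > v_user."""
    if not v0 < v_user < 0.0:
        raise ValueError("expected v0 < v_user < 0")
    # Solve v0 * exp(s * N / v0) > v_user for N (v0 is negative).
    return math.floor(v0 / s * math.log(v_user / v0)) + 1
```

Choosing the threshold from a target number of interactions, rather than choosing the threshold directly, is what lets a manager control how many associations a keyword needs before it becomes active.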




EP 0 938 053 A1 




[0036] The coefficient $n_k$ counts associations with the keyword k that are active, which means that the other
coefficients $v_k^{0}$, $C^{(1)}_{inc}$, $C^{(1)}_{dec}$, $C^{(2)}_{inc}$ and $C^{(2)}_{dec}$ (the increasing and decreasing coefficients for the two parts of the
curve) only relate to the active keywords. When the status of a keyword k in an association changes, the value of $n_k$
must be increased (candidate keyword to user keyword) or decreased (user keyword to candidate for GC keyword). Then
all the other coefficients must be recalculated.

[0037] If a new association is made with a new keyword k, the system should initialise the value of $n_k$ to 1 and then
calculate $v_k^{0}$, $C^{(1)}_{inc}$, $C^{(1)}_{dec}$, $C^{(2)}_{inc}$ and $C^{(2)}_{dec}$. Then the association is created between the data item and the keyword, and
$v_{ik}$ gets the relationship coefficient value $v_k^{0}$.

[0038] When a new association is made but with a keyword k already in the system (i.e. indexing other data items),
this new association is created and its relationship coefficient is initialised to $v_k^{0}$. As it is not an active keyword for that
data item yet, it is not counted in $n_k$, so the other coefficients are not yet recalculated.
[0039] Some example scenarios incorporating the present invention will now be briefly described: 

1) Indexing new data items: a batch of photographs has to be added to the image repository and there exists a
community of picture indexers (these could be the users browsing the collection). Each indexer is given a randomly
selected picture from the new batch and is asked to provide the keywords. These are added to the keyword set
indexing the picture or if the keyword is already present the counter associated with it is incremented. Provided that
users agree on a subset of keywords for a given picture, these would eventually emerge with the higher score.

2) Searching an indexed collection of data items: the collection of photographs is being searched with keywords by
a large user community. The candidate photographs selected in accordance with the keywords are shown in
thumbnail form, and once a thumbnail image is selected for viewing of the complete version at full size and
resolution the search keywords modify existing indices by a small factor (learning rate). If the user subsequently chooses
to purchase a copy of the image, the association of the search keywords with the image may accordingly be
strengthened further. One could try to obtain the thesaurus automatically assuming that two subsequently entered
keywords are related semantically (assuming keywords are nouns only). This is a very weak assumption and most
of the pairs would constitute "noise" (i.e. they have a very small probability of being entered by another user) but
consistently entered keyword pairs would emerge through the scoring procedure.

[0040] Many applications for adaptive indexing exist. The World Wide Web provides one particularly attractive
opportunity since its user community is huge and diverse. People use the Web to search for information of any type and are
sensitive to delays in the search so the quality of indexing is very important. The Web also changes very quickly as
technologies evolve; there is a need for a maximum of dynamism as well as availability of the information.
[0041] Adaptive indexing might also be very useful for smaller user communities. A corporate user community can, for
instance, train the search tool to use their own specialized vocabulary. Since the indexing is adaptive, the indexes can
be specific or dedicated to a particular area.

[0042] This system would be extremely useful for image libraries, since automatic tools to index images are very
difficult to produce. The way that we describe an image is also dependent on what we take into account in the image: it
may be the elements which go into its composition, or the emotion that it provokes, for instance. An adaptive indexing
tool will build a set of descriptors which reflect what the majority of people searching the image library think about an
image, making it easily retrievable by this majority.

[0043] On one hand therefore this technique may be used to index the Web and make a data item easily reachable
by a majority of people searching for it, and on the other hand it allows the use of a very restricted vocabulary for
indexing in a small user community with rigid rules. The system adapts itself to the environment and can be moved smoothly
from one environment to another.

[0044] In fact this system tries to capture real life perception of objects in the environment. We all have different ways 
of describing something but the description that is most often used in the appropriate community can be considered as 
the democratic description. Thus an adaptive indexing system can act as a repository for human knowledge, and taking 
"snapshots" of the state of the system from time to time could allow cultures to be compared over time. 
[0045] Although the system can adapt its indexes automatically without intervention, there is also the possibility that 
a manager of the system can set some parameters in accordance with the search capabilities needed: 

- modification of the number of descriptors for the data. By modifying the threshold of the minimum weight or the total
number of descriptors allowed, one can decide how wide the vocabulary will be.

- modification of the amplitude of the weight: the manager can choose whether it is appropriate to have weights
which are very close together or far apart. This has to do with the strategy for training the system when a new
vocabulary has to be built up, such as in the initial phase, or just after some event such as a change of user
community which is likely to bring in a significant number of new descriptors and make some of the old descriptors
obsolete. We could start with weights close together so that links can be made easily between pieces of data and new
descriptors, and later make the weights further apart as the vocabulary for the description stabilizes.
Claims

1. A method of refining descriptors associated with data items to enable retrieval thereof, comprising the steps of:

storing said data items in both a summary form and in a complete form;

receiving a search request from a user for selection of data items, said request incorporating at least one
descriptor;

sending the user a search result comprising only the summary form of data items selected in accordance with
said search request; and

using a response of the user requesting the complete form of a selected data item in the search result to guide
modifications of the descriptors.

2. The method of claim 1, wherein the users' responses comprise explicit comments supplied by the users indicative
of the utility of search results.

3. The method of claim 1 or claim 2, wherein the data items are any one or more of images, text, audio records or
video records.

4. The method of claim 3, wherein the data items are images and the summary forms comprise thumbnail versions of
the images.

5. The method of claim 1, wherein a further response of the user in selecting a further action in respect of the
complete form of a data item after the complete form has been provided is also used to guide modifications of the
descriptors.

6. The method of claim 1, wherein association between a data item and a descriptor has a weight indicating strength
of that association.

7. The method of claim 6, wherein modification of the descriptors in accordance with user response includes
modification of the weight of association.

8. The method of claim 6 or claim 7, wherein an association between a data item and a descriptor is assigned a first
weight upon an initial occurrence of that data item with that descriptor, the weight is increased upon subsequent
occurrences of that data item with that descriptor, and the weight is decreased upon subsequent occurrences of
that data item without that descriptor.

9. The method of claim 8, wherein the descriptor becomes usable for retrieval of the data item when the weight
reaches a first predetermined threshold, and the descriptor becomes no longer usable for retrieval of that data item
if the weight falls to a second predetermined threshold.

10. The method of claim 8 or claim 9, wherein change in weight for successive said occurrences is determined in
accordance with an exponential function.